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Abstract 

In this paper we study a Markov Chain Monte Carlo (MCMC) Gibbs sampler for solv- 
ing the integer least-squares problem. In digital communication the problem is equivalent to 
performing Maximum Likelihood (ML) detection in Multiple-Input Multiple-Output (MIMO) 
systems. While the use of MCMC methods for such problems has akeady been proposed, our 
method is novel in that we optimize the "temperature" parameter so that in steady state, i.e. 
after the Markov chain has mixed, there is only polynomially (rather than exponentially) small 
probability of encountering the optimal solution. More precisely, we obtain the largest value 
of the temperature parameter for this to occur, since the higher the temperature, the faster the 
mixing. This is in contrast to simulated annealing techniques where, rather than being held 
fixed, the temperature parameter is tended to zero. Simulations suggest that the resulting Gibbs 
sampler provides a computationally efficient way of achieving approximative ML detection in 
MIMO systems having a huge number of transmit and receive dimensions. In fact, they further 
suggest that the Markov chain is rapidly mixing. Thus, it has been observed that even in cases 
were ML detection using, e.g. sphere decoding becomes infeasible, the Gibbs sampler can still 
offer a near-optimal solution using much less computations. 

I. Introduction 

The problem of performing Maximum Likelihood (ML) decoding in digital communication has 
gained much attention over the years. One method to obtain the ML solution is Sphere Decoding 
(SD) [l]-[5]. Over a wide range of Signal-to-Noise Ratios (SNR)s the average complexity of SD 
is significantly smaller than exhaustive search detectors, but in worst case the complexity is still 
exponential [6]. Thus, in scenarios with poor SNR or in Multiple-Input Multiple-Output (MIMO) 
systems with huge transmit and receive dimensions, even SD can be infeasible. A way to overcome 
this problem is to use approximate Markov Chain Monte Carlo (MCMC) detectors instead, which 
asymptotically can provide the optimal solution, [7], [8]. Gibbs sampling (also known as Glauber 
dynamics) is one MCMC method, which is used for sampling from distributions of multiple 



dimensions. The Gibbs sampler has among others been proposed for detection purposes in wireless 
communication in [9]-[12] (see also the references therein). The scope of this paper is to describe 
and analyse a new way of solving the integer least-squares problem using MCMC. It will be 
shown that the method can be used for achieving a near-optimal and computationally efficient 
solution of the problem, even for systems having a huge dimension. 

The paper is organized as follows; In Section II we present the system model that will be 
used throughout the paper. The MCMC method is described in Section III and in Section IV 
we analyse the probability of error for the ML detector. Section V treats the optimal selection 
of the temperature parameter a, while the simulation results are given in Section VI and some 
concluding remarks are found in Section VII. 

II. System Model 

We consider a real-valued block-fading MIMO antenna system, with N transmit and N receive 
dimensions, with know channel coefficients.' The received signal y G can be expressed as 



^Hs + „, (1, 

where s € is the transmitted signal, and denotes the constellation set. To simplify the 



derivations in the paper we will assume that = {±1}. v G M is the noise vector where 
each entry is Gaussian 7V(0, 1) and independent identically distributed (i.i.d.), and H € M^^^ 
denotes the channel matrix with i.i.d. A/'(0, 1) entries. The normalization in (1) guarantees that 
SNR represents the signal-to-noise ratio per receive dimension (which we define as the ratio 
of the total transmit energy per channel use divided by the per-component noise variance as 
described in among others [5]). As explained further below, for analysis purposes we will focus 
on the regime where SNR > 21n(A^), in order to get the probability of error of the ML detector 
to go to zero. Further, in our analysis, without loss of generality, we will assume that the all 
minus one vector was transmitted, s = — 1. Therefore 



/SNR 

y = --V^Hi. (2) 

We are considering a minimization of the average error probability P (e) = P (s 7^ s), which 
is obtained by performing Maximum Likelihood Sequence Detection (here simply referred to as 
ML detection) given by 
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III. Gibbs Sampling 



(3) 



One way of solving the optimization problem given in (3) is by using Markov Chain Monte 
Carlo (MCMC) simulations, which asymptotically converge to the optimal solution [13]. More 
specifically, the MCMC detector we investigate here is the Gibbs sampler, which computes the 
conditional probability of each symbol in the constellation set at the jth index in the estimated 



'For simplicity we liave assumed that the receive and the transmit dimensions are the same, but the results presented 
in the paper can be generalized to cover different dimensions. 



symbol vector. This conditional probability is obtained by keeping the j — 1 other values in the 
estimated symbol vector fixed. Thus, in kth iteration the probability of the jth symbol adopts the 
value ijj, is given as 
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(4) 



where sJJ^ 



Jk-i) 



and where we for simplicity have introduced 6 = {s^'^"^) , y, H} } 
a represents a tunable positive parameter which controls the mixing time of the Markov chain, 
this parameter is also sometimes called the "temperature". The larger a is the faster the mixing 
time of the Markov chain will be, but as we will show in the paper, there is an upper limit on 
a, in order to ensure that the probability of finding the optimal solution in steady state is not 
exponentially small. The MCMC method will with probability p (^s^^^^ = uj\9^ keep uj at the j'th 
index in estimated symbol vector, and compute conditional probability the {j + l)th index in a 
similar fashion. We define one iteration of the Gibbs sampler as a randomly-ordered update of all 
the j = {1, . . . , Nt} indices in the estimated symbol vector s.-^ The initialization of the symbol 
vector s^'^) can either be chosen randomly or, alternatively, e.g. the zero-forcing solution can be 
used. 

A. Complexity of the Gibbs sampler 

The conditional probability for the j'th symbol in (4) can be computed efficiently by reusing 



the result obtained for the j — I'th symbol, when we evaluate 

are only changing the j'th symbol in the symbol vector, the difference dj 
can be expressed as 



y - VSNR/iVHs 



. Since we 
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(5) 



where As,-|,^. = s^^) —s^'f- 

j\uj 



. Thus, the computation of conditional probability of certain symbol 
in the j'th position costs 2N operations, where we define an operation as a Multiply and 
Accumulate (MAC) instruction.'* This leads to a complexity of O (2A^^[|J7| — 1]) operations 
per iteration. For further details on the implementation of the Gibbs sampler see [14]. 



When we compute the probabihty of symbol to at the j'th position, we more precisely condition on the symbols 
Si.j_i and s^_(.j.j(r^, but to keep the notation simple, we do not explicitly state that in the equations above. 

'We need a randomly-ordered update for the Markov chain to be reversible and for our subsequent analysis to go 
through. It is also possible to just randomly select a symbol j to update, without insisting that a full sequence be 
done. This also makes the Markov chain reversible and has the same steady state distribution. In practice a fixed, 
say sequential, order can be employed, although the Markov chain is no longer reversible. Note that our theoretical 
analysis is assuming randomly selected symbol updates for analytical convenience. In our experimental section we 
used a sequential updating order which empirically yields a slight convergence acceleration. 

■*We need to compute both the inner product djdj and the product Hi:jv,j As^j;^ . 



IV. Probability of Error 



In this paper, we are interested in evaluating the performance of the aforementioned Gibbs 
sampler, compared to the ML solution. To ease our analysis, we will assume that the ML detector 
finds the correct transmitted vector. Before we derive the probability of error for the ML detector, 
we will state a lemma which we will make repeated use of. 

Lemma IV.l (Gaussian Integral). Let v and x be independent Gaussian random vectors with 
distribution AA(0,lAr) each. Then, if I — 2a^r/(l + 2r/) > 0, 



^ fgr,(||v+ax||^-||v||^)l ^ ( ^ \ 
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Proof: See Appendix VIII- A for a detailed proof. ■ 

Assuming that the vector s = — 1 was transmitted, the ML detector will make an error if there 
exists a vector s ^ —1 such that 
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In other words. 
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for some s / — 1, which can be formulated as 
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for some S ^ 0. Note that in the above equation 5 is a vector of zeros and — I's. Now using the 
union bound 
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We will use the Chemoff bound to bound the quantity inside the summation. Thus, 
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where /3 > is th e Chemoff parameter, and where we have used Lemma IV. 1 with -q = —(3 and 

o /SNR||<5||2 . 
a = l\l " , smce 
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The optimal value for /3 is i, which yields the tightest bound 
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Note that this depends only on the number of nonzero entries in S. Plugging this into the 

union bound yields 
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Let us first look at the linear (i.e., i proportional to N) terms in the above sum. Thus, 
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where H{-) is entropy in "nats". Clearly, if limTv^oo SNR = oo, then the linear terms go to zero 
(superexponentially fast). 

Let us now look at the sublinear terms. In particular, let is look at i = 1: 
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Clearly, to have this term go to zero, we require that SNR > 2 In N. A similar argument shows 
that all other sublinear terms also go to zero, and so.^ 

Lemma IV.2 (SNR scahng). // SNR > 2 In N, then Pe ^ as N ^ oo. 

^Due to space constraints we only present a sketch of this bound. A rigorous proof can be given using the saddle 
point method, similarly to the proof in the next section. 



V. Computing the optimal a 

Assuming that the vector s = — 1 has been transmitted, the probability of finding this solution 
after the Markov chain has mixed is simply 7r_i, the steady-state probability of being in the all 
—1 state. Clearly, if this probability is exponentially small, it will take exponentially long for 
the Gibbs sampler to find it. We will therefore insist that the mean of 7r_i be only polynomially 
small. 



A. Mean o/tt-i 

This calculation has a lot in common with the one given in Section IV. Note that the steady 
state value of 7r_i is simply 
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where 6 is a vector of zeros and ones and the summations (over s and S) are over 2" terms. 



Now, by Jensen's inequality 
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In (12d) we have used Lemma IV. 1 and in (12e) we have defined /5 = 4SNR^(1 — While 
it is possible to focus on the linear and sublinear terms in the above summation separately, 
to give conditions for E {it-i} to have the form of l/poly(A^), we will be interested in the 
exact exponent and so will need a more accurate estimate. To do this we shall use saddle point 
integration. Note that 
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where again represents the entropy in "nats". And so the summation in the denominator of 
(12e) can be approximated as a Stieltjes integral: 
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For large N, this is a saddle point integral and can be approximated by the formula 

aNf{x)^^ _ , / 27r ^Nf(xo) 
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where xq, is the saddle point of /(•), i.e.,/'(xo) = 0. In our case, 

f{x) = —X In X — (1 — x) ln(l — x) — - ln(l + /?x) 

and so 

f (x) = In 



2 1 + /3x 

In general, it is not possible to solve for /'(xq) = in closed form. However, in our case, 
if we assume that (3 = 4SNR^(1 — ^) » 1 (which is true since the SNR grows at least 
logarithmically), then it is not too hard to verify that the saddle point is given by 
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xo = e 2 . (15) 



And hence /(xq) 



e 2 In e 2 — (^i — e 2 j in(l — e ^ ) ~ 2 + 2 
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and further plugging xq into f"{x) = -^-j^- yi^^^^ 



/"(xo) P« -ef - 1 + « -ef . (16) 
Replacing these into the saddle point expression in (14) show that 

We want E {vr_i} to behave as and according to (12) this means that we want the expression 
in (17) to behave as N'^. Let us take 



Solving for (3 yields 



/3 = 4SNR^ ( 1-^ ) = 2(lniV-lnlniV-lnC). (18) 



Incidentally, this choice of (3 yields e 4 « and so we have the following result. 
Lemma V.l (Mean of vr_i). If a is chosen such that 
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then 
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Figure 1: Value of a vs. SNR for system size N = 10. 

B. Value of a 

Note that from (12e) it is clear that the larger (3 is, the larger 7r_i is. Therefore, the range of 
a that gives a polynomially small probability to 7r_i is 

2SNR 

1-4 " IniV - InlniV - \nC' ^^^"^ 

It can be shown that in the regime, SNR > 2 In A^, the above quadratic inequality in a has two 
positive real solutions, a+ > a_, and that the inequality holds for all a G [a„,a+]. 

We know that, the larger a is, the faster the Markov chain mixes. ^ Therefore it is reasonable 
that we choose the largest permissible value for a, i.e., a+. 

Figures 1 and 2 show the values of a+ and a_ as a function of SNR for systems with = 10 
and N = 50, when we have C, = l/\n.N. 





















SNR = 2ln(N) 



























































































































0.5 1 ' ' ' ' = ^ ' ' ' 

2 4 6 8 10 12 14 16 

SNR [dB] 



Figure 2: Value of a vs. SNR for system size = 50. 



*In general, there is a trade-off between faster mixing time of the Markov chain (due to an increase of a) versus 
slower encountering the optimal solution in steady-state. In fact, at infinite temperature our algorithm reduces to a 
random walk in a hypercube which mixes in 0{N In A'^) time. 
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Figure 4: BER vs. iterations, 10 x 10 system. SNR - 14 dB. 



C. Mixing time of Markov Chain 

One open question is whether the Markov chain is rapidly mixing when using the strategy 
above for choosing a. This is something we are currently investigating, and the simulations 
presented in Section VI seem to indicate that this is the case. Furthermore, the simulations also 
suggest that the computed value of a is very close to the optimal choice, even in the case where 
the condition SNR > 2ln{N) is not satisfied. 

VI. Simulation Results 

In this section we present simulation results for a MIMO N x N system with a full square 
channel matrix containing i.i.d. Gaussian entries. In Fig. 3 and Fig. 4 the Bit Error Rate (BER) 
of the Gibbs sampler, initialized with a random s, has been evaluated as a function of the 
number of iterations in a 10 x 10 system using a variety of a values. Thereby, we can inspect 
how the parameter a affects the convergence rate of the Gibbs sampler. The performance of the 
Maximum Likelihood (ML), the Zero-Forcing (ZF), and the Linear Minimum Mean Square Error 
(LMMSE) detector has also been plotted, to ease the comparison of the Gibbs sampler with these. 
It is seen that the Gibbs sampler outperforms both the ZF and the LMMSE detector after only a 
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Figure 5: BER vs. SNR, 10 x 10. Number of iterations, k = 100. 

few iterations in all the presented simulations, when the tuning parameter a is chosen properly. 
Furthermore, it is observed that the parameter a has a huge influence on the convergence rate 
and that the Gibbs sampler converges toward the ML solution as a function of the iterations.^ 
The optimal value of a (in terms of convergence rate) is quite close to the theoretical values from 
Fig. 1 of a+ = 2.7 and a+ = 4.6 at SNR's at 10 and lAdB, respectively. It is also observed that 
the performance of the Gibbs sampler is significantly deteriorated if the temperature parameter 
is chosen based on the SNR (and thereby on the noise variance), such that a = a = 1/SNR. 
Thus, the latter strategy is clearly not a wise choice. 

Figure 5 shows the BER performance for the MCMC detector for fixed number of iterations, 
k = 100. From the figure we see that the SNR has a significant influence on the optimal choice 
of a given a fixed number of iterations. 

The performance of the Gibbs sampler is also shown for a 50 x 50 system, which represents 
a ML decoding problem of huge complexity where an exhaustive search would require 2^*^ f« 
10^^ evaluations. For this problem even the sphere decoder has an enormous complexity under 
moderate SNR.^ Therefore, it has not been possible to simulate the performance of this decoder 
within a reasonable time and we have therefore "cheated" a little by initializing the radius of 
the sphere to the minimum of either the norm of the transmitted symbol vector or the solution 
found by the Gibbs sampler. This has been done in order to evaluate the BER performance of 
the optimal detector. Figure 6 shows the BER curve as a function of the iteration number, while 
Figure 7 illustrates the BER curve vs. the SNR. From Figure 6 we see that there is a quite 
good correspondence between the simulated a and the theoretical value a+ = 2.6 obtained from 
Figure 2. The average complexity (MAC pr. symbol vector) of the Gibbs sampler having a BER 
performance comparable with the ML detector is shown in Table I. The SD has been included 

'it should be noted that the way we decode the symbol vector to a given iteration, is to select the symbol vector 
which has the lowest cost function in all the iterations up to that point in time. 

^In fact, it can be shown that, for SNR = C'(lnA'^), the lower bound on the complexity of the sphere decoder 
obtained in [6] is exponential. 
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Figure 6: BER vs. iterations, 50 x 50 system. SNR = 12 dB. 
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Figure 7: BER vs. SNR, 50 x 50 system. Num. of iter., k = 500. 



Table I: Complexity of SD and Gibbs Sampler (GS). 
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Method 
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10 dB 


14 dB 


10 


GS 
SD 


9.8 
10.0 


• 10^ 
■ 10^ 


10.9 • 10^ 
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> 1.9- 10^ 


37.7 ■ 10^ 



as a reference.^ It is observed that the complexity of the Gibbs sampler is not affected by the 
SNR as much as the SD. 

VII. Conclusion 

In this paper we considered solving the integer least-squares problem using Monte Carlo 
Markov Chain Gibbs sampling. The novelty of the proposed MCMC method is that, unlike 
simulated annealing techniques, we have a fixed temperature parameter in all the iterations, with 
the property that after the Markov chain has mixed, the probability of encountering the optimal 



'it has not been possible to simulate the SD for a 50 x 50 system when SNR < lOdB and, therefore, the complexity 
of SNR — 12dB has been used a lower bound. 



solution is only polynomial small (i.e. not exponentially small). We further compute the optimal 
(here largest) value of the temperature parameter that guarantees this. Simulation results indicate 
the sensitivity of the method to the choice of the temperature parameter and show that our 
computed value gives a very good approximation to its optimal value. Investigating whether the 
Markov chain mixes in polynomial time for this choice of temperature parameter is currently 
under investigation. 

VIII. Appendix 

A. Proving Lemma IV.l 

Lemma IV.l (Gaussian Integral) Let v and x be independent Gaussian random vectors with 
distribution A/'(0,l7v) each. Then 
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Proof: In order to determine the expected value we compute the multivariate integral 
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Thus, Lemma IV.l has hereby been proved. 
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